Table structure understanding and its performance evaluation
Abstract
Given the large number of existing documents and the accelerating production of new ones, efficient methods for processing documents for content retrieval and storage are becoming critical. Tables are a popular and efficient document element type, so table structure understanding is an important problem in the field of document layout analysis. This paper presents a table structure understanding algorithm based on optimization methods. It consists of five steps: column style labeling, location of large horizontal blank block equivalence subsets, statistical refinement, iterative updating optimization, and table decomposition. The column style labeling, statistical refinement, and iterative updating optimization steps are probability based; the probabilities are estimated from geometric measurements made, over a large training set, on the various entities with which the algorithm works. Each step of the algorithm has some tuning parameters. We initially set these parameters to conjectural values and then, within a global parameter optimization scheme, update them using a line search optimization algorithm. Performance is evaluated with a protocol that employs an area overlapping measure. With this scheme, we can obtain statistically satisfactory tuning parameter values on the fly. Large data sets with ground truth are essential in assessing the performance of a computer vision algorithm. Manually generating document ground truth has proved to be very costly and prone to subjective errors. We address this problem with an automatic table ground truth generation system that can efficiently generate a large amount of accurate ground truth suitable for the development of table structure understanding algorithms. This software package is publicly available. The training and testing data set for the algorithm includes 1,125 document pages containing 518 table entities and a total of 10,934 cell entities.
The algorithm achieved 96.76% accuracy at the cell level and 98.32% accuracy at the table level. We also implemented and tested two other published table structure understanding algorithms. On the same data set, given the perfect column structure as input, these two algorithms achieved 88.89% and 82.71% accuracy at the table level. Our algorithm compares favorably with both.
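The abstract names an area overlapping measure but does not define it. The following is a minimal sketch of one plausible form of such a measure, not the paper's protocol: a detected cell counts as correct when its overlap with a ground-truth cell covers a large enough fraction of both boxes. The names `Box`, `overlap_area`, `is_match`, and `cell_accuracy`, and the 0.9 threshold, are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box of a table cell (hypothetical representation)."""
    left: float
    top: float
    right: float
    bottom: float


def area(box: Box) -> float:
    """Area of a box; degenerate boxes have area 0."""
    return max(0.0, box.right - box.left) * max(0.0, box.bottom - box.top)


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes, or 0 if they do not overlap."""
    width = min(a.right, b.right) - max(a.left, b.left)
    height = min(a.bottom, b.bottom) - max(a.top, b.top)
    return width * height if width > 0 and height > 0 else 0.0


def is_match(truth: Box, detected: Box, threshold: float = 0.9) -> bool:
    """True when the overlap covers at least `threshold` of both boxes.

    The 0.9 default is an assumed value, not one taken from the paper.
    """
    inter = overlap_area(truth, detected)
    if inter == 0.0:
        return False
    return (inter / area(truth) >= threshold
            and inter / area(detected) >= threshold)


def cell_accuracy(truth_cells: List[Box], detected_cells: List[Box],
                  threshold: float = 0.9) -> float:
    """Fraction of ground-truth cells matched one-to-one by a detected cell."""
    if not truth_cells:
        return 0.0
    unused = list(detected_cells)
    hits = 0
    for truth in truth_cells:
        for detected in unused:
            if is_match(truth, detected, threshold):
                unused.remove(detected)  # each detection may match only once
                hits += 1
                break
    return hits / len(truth_cells)


if __name__ == "__main__":
    truth = [Box(0, 0, 10, 10), Box(10, 0, 20, 10)]
    detected = [Box(0, 0, 10, 10), Box(10, 0, 14, 10)]
    print(cell_accuracy(truth, detected))
```

Requiring the overlap to cover both boxes penalizes detections that are too large (merged cells) as well as those that are too small (split cells). A table-level score could be computed the same way over table bounding boxes.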
Journal title:
- Pattern Recognition
Volume 37, Issue
Pages -
Publication date 2004